Additional use of phoneme duration hypotheses in automatic speech segmentation

Authors

  • Karlheinz Stöber
  • Wolfgang Hess
Abstract

In this paper, we describe a new approach to speaker-independent automatic phoneme alignment. Typical algorithms for this task rely only on phoneme-to-frame similarity measures, which are maximised or minimised over the alignment. In addition to such similarity measures, we use phoneme duration hypotheses generated by the speech synthesis system HADIFIX [1]. Because these duration hypotheses are difficult to incorporate into algorithms based on dynamic programming, we define a cost function that combines phoneme-to-frame and segment-to-duration-hypothesis similarity measures, and minimise it with a Genetic Algorithm. The results show that the accuracy of the automatically determined phoneme boundaries increases, especially for speakers not used in the training phase.
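The combined cost function and its minimisation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the actual similarity measures, weighting, or genetic operators, so everything here is an assumption made for illustration. That includes the names `alignment_cost`, `repair`, and `genetic_align`, the absolute-difference duration penalty, the `weight` parameter, and the truncation-selection scheme. `frame_cost[p][t]` is assumed to hold the dissimilarity between phoneme `p` and frame `t`, and `dur_hyp[p]` the hypothesised duration of phoneme `p` in frames.

```python
import random


def alignment_cost(boundaries, frame_cost, dur_hyp, weight=1.0):
    """Cost of one segmentation; boundaries has len(dur_hyp) + 1 entries."""
    total = 0.0
    for p, (start, end) in enumerate(zip(boundaries, boundaries[1:])):
        # Phoneme-to-frame term: dissimilarity of phoneme p to its frames.
        total += sum(frame_cost[p][start:end])
        # Segment-to-duration term (assumed form): deviation of the
        # segment length from the hypothesised duration.
        total += weight * abs((end - start) - dur_hyp[p])
    return total


def random_boundaries(n_phonemes, n_frames, rng):
    """Random strictly increasing boundaries from 0 to n_frames."""
    inner = sorted(rng.sample(range(1, n_frames), n_phonemes - 1))
    return [0] + inner + [n_frames]


def repair(boundaries, n_frames):
    """Clamp boundaries so every segment keeps at least one frame."""
    n = len(boundaries) - 1
    out = [0]
    for i in range(1, n):
        lo = out[-1] + 1          # stay after the previous boundary
        hi = n_frames - (n - i)   # leave room for the remaining segments
        out.append(min(max(boundaries[i], lo), hi))
    out.append(n_frames)
    return out


def genetic_align(frame_cost, dur_hyp, pop_size=40, generations=200,
                  weight=1.0, seed=0):
    """Minimise alignment_cost with a simple genetic algorithm.

    Assumes at least two phonemes and at least one frame per phoneme.
    """
    rng = random.Random(seed)
    n_phonemes, n_frames = len(frame_cost), len(frame_cost[0])

    def cost(b):
        return alignment_cost(b, frame_cost, dur_hyp, weight)

    pop = [random_boundaries(n_phonemes, n_frames, rng)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)
        survivors = pop[:pop_size // 2]   # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n_phonemes + 1)
            child = a[:cut] + b[cut:]             # one-point crossover
            i = rng.randrange(1, n_phonemes)      # pick an inner boundary
            child[i] += rng.choice((-2, -1, 1, 2))  # mutation: small shift
            children.append(repair(child, n_frames))
        pop = survivors + children
    return min(pop, key=cost)
```

Calling `genetic_align(frame_cost, dur_hyp)` returns a list of frame indices that starts at 0 and ends at the number of frames, with one segment per phoneme. Because the duration term depends on a whole segment rather than on a single frame, it is added per segment here, which is straightforward for a search over complete boundary vectors.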


Similar resources

Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM

Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research shows that using deep neural networks (DNNs) in speech recognition systems significantly improves their performance. DNN-based phoneme recognition systems have two phases: training and testing. Mos...


Improved HMM/SVM methods for automatic phoneme segmentation

This paper presents improved HMM/SVM methods for a two-stage phoneme segmentation framework, which tries to imitate the human phoneme segmentation process. The first stage performs hidden Markov model (HMM) forced alignment according to the minimum boundary error (MBE) criterion. The objective is to align a phoneme sequence of a speech utterance with its acoustic signal counterpart based on MBE-...


Statistical corpus-based speech segmentation

An automatic speech segmentation technique is presented that is based on the alignment of a target speech signal with a set of different reference speech signals generated by a specifically designed corpus-based speech synthesis system that additionally generates phoneme boundary markers. Each reference signal is then warped to the target speech signal. By synthesizing and warping many different re...


Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...


Automatic Speech Segmentation Based on HMM

This contribution deals with the problem of automatic phoneme segmentation using HMMs. Automating the speech segmentation task is important for applications where large amounts of data must be processed, so manual segmentation is out of the question. In this paper we focus on automatic segmentation of recordings that will be used to create a triphone synthesis unit database. For speech...



Publication date: 1998